NONLINEAR THREE STAGE LEAST SQUARES POOLING
OF CROSS SECTION AND TIME SERIES OBSERVATIONS

by

Dale W. Jorgenson and Thomas M. Stoker

1. Introduction. The purpose of this paper is to discuss the pooling of cross section and average time series data by the method of nonlinear three stage least squares introduced by Jorgenson and Laffont (1974). We consider applications of this method to exact aggregation models, where there is a unique correspondence between individual and aggregate behavior. This correspondence makes exact aggregation models appropriate for the analysis of individual data, average data, or both in combination.

We consider observations on $K$ individuals, indexed by $k = 1, 2 \ldots K$, for $T$ time periods, indexed by $t = 1, 2 \ldots T$. We can represent the structural form of an exact aggregation model for the $k$th individual in the $t$th time period by:

$$ y_{nkt} = x_{kt}'\,\beta_n(p_t, \theta), \qquad (n = 1, 2 \ldots N). $$

The observations $y_{nkt}$ and $x_{kt}$ vary over both individuals and time periods, while the vector of observations $p_t$ varies over time periods, but is the same for all individuals in a given time period. The coefficients $\beta_n(p_t, \theta)$ are functions of the observations $p_t$ and the vector of $L$ structural parameters $\theta' = (\theta_1, \theta_2 \ldots \theta_L)$. Restrictions on the parameters are embodied in the forms of these functions.

We can write the exact aggregation model for the $k$th individual in vector form:

$$ y_{kt} = (I_N \otimes x_{kt}')\,\beta(p_t, \theta), \qquad (1) $$

where $y_{kt}$ is a vector of $N$ observations, $\beta(p_t, \theta)$ stacks the $N$ coefficient vectors $\beta_n(p_t, \theta)$, and $I_N$ is the identity matrix of order $N$. By averaging the model (1) over all individuals for each time period, we obtain the structural form of the exact aggregation model for averaged data:


$$ \bar y_t = (I_N \otimes \bar x_t')\,\beta(p_t, \theta), \qquad (2) $$

where $\bar y_t$ and $\bar x_t$ are the averages of $y_{kt}$ and $x_{kt}$ over all individuals.

The models for individual cross section and average time series observations contain the same parameter vector $\theta$ and the same coefficient vector $\beta(p_t, \theta)$. This reflects the correspondence between individual and aggregate behavior that characterizes exact aggregation models. The forms of the individual and aggregate models (1) and (2) are necessary and sufficient for exact aggregation, provided that the population distribution of $x_{kt}$ is unrestricted.
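The correspondence between (1) and (2) can be checked numerically. The sketch below assumes numpy and a hypothetical coefficient function `beta` invented purely for illustration (it is not one of the demand systems discussed in this paper); the only property that matters is that each individual's response is linear in $x_{kt}$, so averaging over individuals commutes with the model.

```python
import numpy as np

def beta(p_t, theta):
    """Hypothetical coefficient function beta(p_t, theta), returned as an
    N x M array whose n-th row is beta_n(p_t, theta). Illustrative only."""
    scale, shift = theta
    return np.outer(np.exp(scale), np.log(p_t)) + shift

def individual_responses(x_t, p_t, theta):
    """Model (1) for every individual: entry (k, n) is y_nkt = x_kt' beta_n."""
    return x_t @ beta(p_t, theta).T

rng = np.random.default_rng(0)
K, M, N = 500, 3, 2                      # individuals, regressors, equations
x_t = rng.normal(size=(K, M))            # rows are x_kt'
p_t = rng.uniform(1.0, 2.0, size=M)      # hypothetical price vector
theta = (rng.normal(size=N), 0.5)

# Average of individual responses versus model (2) evaluated at x-bar_t
y_bar_individual = individual_responses(x_t, p_t, theta).mean(axis=0)
y_bar_aggregate = beta(p_t, theta) @ x_t.mean(axis=0)
```

The two averages agree to rounding error. Replacing the linear form `x_t @ beta(...).T` with a nonlinear function of each $x_{kt}$ would break the equality, which is why the restricted form of (1) matters.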

As an example of exact aggregation models we first consider the linear model that underlies previous discussions of pooling cross section and time series data:

$$ y_{nkt} = p_t'\theta_{1n} + z_{kt}'\theta_{2n}, \qquad (n = 1, 2 \ldots N), $$

where $\theta_{1n}$ and $\theta_{2n}$ are vectors of parameters. In this example the vector of parameters $\theta$ of models (1) and (2) includes the elements of $\theta_{1n}$ and $\theta_{2n}$ $(n = 1, 2 \ldots N)$. The vector of coefficients $\beta_n(p_t, \theta)'$ is $(p_t'\theta_{1n}, \theta_{2n}')$ and the vector of observations $x_{kt}'$ is $(1, z_{kt}')$.

Demand analysis provides many examples of nonlinear exact aggregation models. In each of these examples the theory of consumer behavior implies constraints on the parameters of the model that are incorporated through the form of the coefficients $\beta_n(p_t, \theta)$ $(n = 1, 2 \ldots N)$. Demand systems generated by the Gorman polar form of the indirect utility function are nonlinear exact aggregation models. Specific examples include the linear expenditure system introduced by Klein and Rubin (1947-1948) and implemented by Stone (1954), the S-branch utility tree of Brown and Heien (1972), and the generalization of the S-branch utility tree of Blackorby, Boyce, and Russell (1978).

As an illustration, the linear expenditure system can be written in exact aggregation form as follows:

$$ y_{nkt} = \Big(p_{nt}c_n - b_n \sum_j c_j p_{jt}\Big) + b_n M_{kt}, \qquad (n = 1, 2 \ldots N), $$

where $y_{nkt}$ denotes expenditure on the $n$th commodity by the $k$th individual in period $t$ and $p_{nt}$ is the price of this commodity $(n = 1, 2 \ldots N)$; $M_{kt}$ is total expenditure on all commodities. The vector of parameters $\theta$ includes the parameters $b_n$ and $c_n$ $(n = 1, 2 \ldots N)$, the vector of coefficients $\beta_n(p_t, \theta)'$ is $(p_{nt}c_n - b_n\sum_j c_j p_{jt},\ b_n)$, and the vector of observations $x_{kt}'$ is $(1, M_{kt})$.
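A minimal sketch of the linear expenditure system in the exact aggregation form above, assuming numpy. The parameter values are illustrative, with the marginal budget shares $b_n$ summing to one as in the usual normalization, so that expenditures add up to $M_{kt}$.

```python
import numpy as np

def les_exact_aggregation(p_t, M_kt, b, c):
    """Linear expenditure system in exact aggregation form.
    p_t: (N,) prices; M_kt: (K,) total expenditures; b, c: (N,) parameters.
    Returns a K x N array of expenditures y_nkt."""
    # beta_n(p_t, theta)' = (p_nt c_n - b_n sum_j c_j p_jt, b_n)
    beta = np.column_stack([p_t * c - b * np.dot(c, p_t), b])    # N x 2
    # x_kt' = (1, M_kt)
    x = np.column_stack([np.ones_like(M_kt), M_kt])               # K x 2
    return x @ beta.T

p_t = np.array([1.0, 2.0, 0.5])
b = np.array([0.2, 0.5, 0.3])      # illustrative; sums to one
c = np.array([1.0, 0.5, 2.0])
M_kt = np.array([10.0, 25.0, 40.0])
y = les_exact_aggregation(p_t, M_kt, b, c)
```

Because $x_{kt}$ enters linearly, evaluating the same function at average expenditure reproduces average expenditures on each commodity.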

More complex nonlinear exact aggregation models have recently been introduced by Deaton and Muellbauer (1980a, 1980b) and by Jorgenson, Lau, and Stoker (1980, 1981, 1982). The AIDS models of Deaton and Muellbauer can be written:

$$ y_{nkt} = \Big(a_n + \sum_j c_{nj}\ln p_{jt}\Big)M_{kt} + b_n M_{kt}\ln\frac{M_{kt}}{P_t}, \qquad (n = 1, 2 \ldots N), $$

where $y_{nkt}$, $M_{kt}$, and $p_{nt}$ are defined as in the linear expenditure system and:

$$ \ln P_t = \sum_j a_j \ln p_{jt} + \frac{1}{2}\sum_i\sum_j c_{ij}\ln p_{it}\ln p_{jt} $$

is a price index. The vector of parameters $\theta$ includes the parameters $a_n$, $b_n$, $c_{nj}$ $(n, j = 1, 2 \ldots N)$, the vector of coefficients $\beta_n(p_t, \theta)'$ is $(a_n + \sum_j c_{nj}\ln p_{jt} - b_n\ln P_t,\ b_n)$, and the vector of observations $x_{kt}'$ is $(M_{kt}, M_{kt}\ln M_{kt})$.
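The AIDS coefficients and observation vector can be checked against the budget share form $w_{nkt} = a_n + \sum_j c_{nj}\ln p_{jt} + b_n\ln(M_{kt}/P_t)$, with $y_{nkt} = w_{nkt}M_{kt}$. The sketch assumes numpy and illustrative parameter values chosen to satisfy adding up ($\sum_n a_n = 1$, $\sum_n b_n = 0$, and each column of $c_{nj}$ summing to zero).

```python
import numpy as np

def aids_price_index(log_p, a, C):
    """ln P_t = sum_j a_j ln p_jt + (1/2) sum_i sum_j c_ij ln p_it ln p_jt."""
    return a @ log_p + 0.5 * log_p @ C @ log_p

def aids_exact_aggregation(log_p, M_kt, a, b, C):
    """Expenditures y_nkt = x_kt' beta_n(p_t, theta), with
    beta_n' = (a_n + sum_j c_nj ln p_jt - b_n ln P_t, b_n) and
    x_kt'   = (M_kt, M_kt ln M_kt). Returns a K x N array."""
    log_P = aids_price_index(log_p, a, C)
    beta = np.column_stack([a + C @ log_p - b * log_P, b])   # N x 2
    x = np.column_stack([M_kt, M_kt * np.log(M_kt)])         # K x 2
    return x @ beta.T

log_p = np.log(np.array([1.0, 1.5, 0.8]))
a = np.array([0.3, 0.4, 0.3])                 # illustrative values
b = np.array([0.05, -0.02, -0.03])
C = np.array([[0.10, -0.05, -0.05],
              [-0.05, 0.08, -0.03],
              [-0.05, -0.03, 0.08]])
M_kt = np.array([20.0, 50.0])
y = aids_exact_aggregation(log_p, M_kt, a, b, C)
```

Because $x_{kt}'$ contains $M_{kt}\ln M_{kt}$, the aggregate observation needed in (2) is the average of $M_{kt}\ln M_{kt}$, not average expenditure times its logarithm.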

The translog model of Jorgenson, Lau, and Stoker can be represented in the form:

$$ y_{nkt} = \frac{a_n + \sum_j b_{nj}\ln p_{jt}}{D(p_t)}\,M_{kt} + \frac{b_{Mn}}{D(p_t)}\,M_{kt}\ln M_{kt} + \sum_s \frac{b_{ns}^A}{D(p_t)}\,M_{kt}A_{skt}, \qquad (n = 1, 2 \ldots N), $$

where $y_{nkt}$, $M_{kt}$, and $p_{nt}$ are defined as above, $A_{skt}$ $(s = 1, 2 \ldots S)$ represents demographic characteristics such as family size, age of head of household, and so on, and:

$$ D(p_t) = -1 + \sum_j b_{Mj}\ln p_{jt}. $$

In this example the vector $\theta$ consists of the parameters $a_n$, $b_{nj}$, $b_{Mn}$, $b_{ns}^A$ $(n, j = 1, 2 \ldots N;\ s = 1, 2 \ldots S)$, the vector of coefficients $\beta_n(p_t, \theta)'$ is

$$ \left(\frac{a_n + \sum_j b_{nj}\ln p_{jt}}{D(p_t)},\ \frac{b_{Mn}}{D(p_t)},\ \frac{b_{n1}^A}{D(p_t)} \ldots \frac{b_{nS}^A}{D(p_t)}\right), $$

and the vector of observations $x_{kt}'$ is $(M_{kt},\ M_{kt}\ln M_{kt},\ M_{kt}A_{1kt},\ M_{kt}A_{2kt} \ldots M_{kt}A_{Skt})$.

In this paper we focus on the implications of nonlinearity for the pooling of cross section and average time series data. In Section 2 we consider the stochastic specification of exact aggregation models (1) and (2). In Section 3 we present and characterize the nonlinear three stage least squares estimator for pooled time series and cross section observations. In Section 4 we discuss hypothesis testing and in Section 5 we consider estimation subject to inequality constraints. We close with a brief summary of the results and a discussion of applications.

2. Stochastic Specification. We begin by considering average observations for $T$ time periods and a single cross section of $K'$ individual observations. We assume that the observations are generated by exact aggregation models (1) and (2) with additive disturbance terms. Given the stochastic specification of the disturbance terms, the observations must be transformed to obtain disturbances that are homoscedastic and uncorrelated across observations.

For pooling of cross section and average time series data the transformation of observations to obtain homoscedastic and uncorrelated disturbances can be divided into two steps. The first step separates the data sets by transforming the average data so that time series disturbances are uncorrelated with cross section disturbances. The second step transforms the resulting data sets to a form where disturbances in each data set are homoscedastic and uncorrelated. We present the transformation for the first step explicitly, indicating the features of this transformation that result in increased efficiency. The second step involves standard techniques for transformation, which we illustrate by example.

We assume that individual observations are generated by the exact aggregation model (1) with an additive random component, say $\varepsilon_{kt}$:

$$ y_{kt} = (I_N \otimes x_{kt}')\,\beta(p_t, \theta) + \varepsilon_{kt}. \qquad (1') $$

We assume that the disturbance term $\varepsilon_{kt}$ is distributed with mean zero and is uncorrelated across individuals, so that:

$$ E(\varepsilon_{kt}\varepsilon_{k't'}') = 0, \qquad k \neq k'. $$

Any systematic correlation among individuals is assumed to be captured by selection of the variables $x_{kt}$. The disturbance term $\varepsilon_{kt}$ is assumed to have variance $\Omega_\varepsilon$ and time series covariance structure $E(\varepsilon_{kt}\varepsilon_{kt'}') = C_{tt'}\Omega_\varepsilon$ (so that $C_{tt} = 1$). A wide variety of alternative time series structures for $\varepsilon_{kt}$ can be represented by choosing an appropriate form for the matrix $C$ with elements $C_{tt'}$.

We could obtain a stochastic version of the exact aggregation model (2) by averaging the individual observations in (1') for each time period. This would be the appropriate procedure if the average data were constructed by averaging the individual observations. However, we must allow for alternative methods for constructing the aggregate data. In demand analysis, for example, data on aggregate personal consumption expenditures are obtained from production accounts for the economy as a whole rather than by direct observation of quantities consumed by the entire population of individual households.

To allow for differences in methods of construction of the individual and aggregate data we introduce an additive random component $\nu_t$ into the exact aggregation model (2) for each time period. The model relating the averaged data $\bar y_t$, $\bar x_t$, and $p_t$ is then:

$$ \bar y_t = (I_N \otimes \bar x_t')\,\beta(p_t, \theta) + u_t, \qquad (2') $$

where $u_t = \nu_t + \bar\varepsilon_t$ and $\bar\varepsilon_t = \sum_k \varepsilon_{kt}/K$ is a vector of $N$ averaged disturbances. The stochastic term $\nu_t$ is assumed to be distributed independently of $\varepsilon_{kt}$ with mean zero, variance $\Omega_\nu$, and time series covariance structure $E(\nu_t\nu_{t'}') = \Omega_\nu^{tt'}$ for $t \neq t'$. Allowing for a variety of time series covariance structures for $u_t$ in this way, and writing $\Omega_\nu^{tt} = \Omega_\nu$, we have:

$$ E(u_t u_{t'}') = \Omega_\nu^{tt'} + \frac{1}{K}\,C_{tt'}\,\Omega_\varepsilon. $$

In order to present methods for pooling cross section and time series data we consider a sample of $K'$ individual observations for a single time period $t_0$. We can "stack" the equations (1') to obtain:

$$ Y_0 = (I_N \otimes X_0)\,\beta(p_{t_0}, \theta) + \varepsilon, \qquad (3) $$

where $Y_0$ is the vector of observations $(y_{nkt_0})$, $X_0$ is the matrix with rows $x_{kt_0}'$, and $\varepsilon$ is the vector of disturbances with mean zero and covariance matrix $\Omega_\varepsilon \otimes I_{K'}$. Similarly, we can represent the equations (2') in the form:

$$ \bar Y = f(\theta) + u, \qquad (4) $$

where $\bar Y$ is the vector of averaged observations $(\bar y_{nt})$,

$$ f(\theta) = \begin{pmatrix} (I_N \otimes \bar x_1')\,\beta(p_1, \theta) \\ (I_N \otimes \bar x_2')\,\beta(p_2, \theta) \\ \vdots \\ (I_N \otimes \bar x_T')\,\beta(p_T, \theta) \end{pmatrix}, $$

and $u$ is the vector of disturbances.

The first step in the transformation of observations eliminates the correlation between $\varepsilon$ and $u$:

$$ E(u_t\varepsilon_{kt_0}') = \frac{1}{K}\,C_{tt_0}\,\Omega_\varepsilon, \qquad (k = 1, 2 \ldots K';\ t = 1, 2 \ldots T). \qquad (5) $$


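This correlation is removed by a nonsingular transformation of (3) and (4), which is equivalent to replacing $\bar y_t$, $\bar x_t$, and $u_t$ in (2') by:

$$ \bar y_t^\circ = \bar y_t - \frac{K'}{K}C_{tt_0}\,\bar y_{cs}, \qquad \bar x_t^\circ = \bar x_t - \frac{K'}{K}C_{tt_0}\,\bar x_{cs}, \qquad u_t^\circ = u_t - \frac{K'}{K}C_{tt_0}\,\bar\varepsilon_{cs}, \qquad (6) $$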


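where $\bar y_{cs}$, $\bar x_{cs}$, and $\bar\varepsilon_{cs}$ denote the cross section averages of $y_{kt_0}$, $x_{kt_0}$, and $\varepsilon_{kt_0}$. Since $E(\bar\varepsilon_{cs}\varepsilon_{kt_0}') = \Omega_\varepsilon/K'$, we have $E(u_t^\circ\varepsilon_{kt_0}') = \frac{1}{K}C_{tt_0}\Omega_\varepsilon - \frac{K'}{K}C_{tt_0}\frac{1}{K'}\Omega_\varepsilon = 0$. The resulting disturbances $u_t^\circ$ are therefore uncorrelated with $\varepsilon_{kt_0}$ $(k = 1, 2 \ldots K')$, but have a more complicated time series structure than the original disturbances:

$$ E(u_t^\circ u_{t'}^{\circ\prime}) = \Omega_\nu^{tt'} + \frac{1}{K}C_{tt'}\Omega_\varepsilon - \frac{K'}{K^2}C_{tt_0}C_{t't_0}\Omega_\varepsilon. \qquad (7) $$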
The second step in the transformation of observations is to apply a nonsingular transformation to the average data in (4) to obtain disturbances that are homoscedastic and uncorrelated. We illustrate this transformation below by example. We assume that the transformation has been performed, altering the model (4) to:

$$ \bar Y^* = f^*(\theta) + u^*, \qquad (8) $$

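where $u^*$ is distributed with mean zero and variance $\Omega_{u^*} \otimes I_T$. For estimation, we stack the equation systems (3) and (8):

$$ Y = \phi(\theta) + U, \qquad (9) $$

where $Y' = (Y_0', \bar Y^{*\prime})$, $\phi(\theta)' = \big([(I_N \otimes X_0)\beta(p_{t_0}, \theta)]',\ f^*(\theta)'\big)$, and $U' = (\varepsilon', u^{*\prime})$, which is distributed with mean zero and variance:

$$ E(UU') = \begin{pmatrix} \Omega_\varepsilon \otimes I_{K'} & 0 \\ 0 & \Omega_{u^*} \otimes I_T \end{pmatrix}. \qquad (10) $$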

The implementation of the transformations described above requires consistent estimates of the variances and covariances $\Omega_\varepsilon$, $C_{tt'}$, and $\Omega_\nu^{tt'}$ $(t, t' = 1, 2 \ldots T)$. In general, these estimates require specific models of the processes generating the disturbances. The purpose of the transformations is to assure efficiency in estimation. Equation (2') shows that the contribution of the individual errors $\varepsilon_{kt}$ to the covariance structure of $u_t$ is likely to be negligible unless the matrices $\Omega_\nu^{tt'}$ are of the same order of magnitude as $\frac{1}{K}C_{tt'}\Omega_\varepsilon$, where $K$ is population size. The benefits of performing the transformation (6) depend on the size of the cross section relative to the population. In many applications $K'/K$ will be extremely small, so that the transformation (6) leaves the observations essentially unaffected. Typical numbers for an analysis of U.S. household demand behavior are $K' = 10{,}000$ and $K = 70$ million, so that $K'/K \approx 1.4 \times 10^{-4}$. Consequently, only when the cross section sample size is of the same order of magnitude as the size of the population will the correction yield significant benefits; otherwise it can be ignored.
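A small Monte Carlo check of transformation (6) and covariance (7) for a single equation at $t = t_0$, assuming numpy, $\Omega_\varepsilon = 1$, $C_{t_0t_0} = 1$, and a toy population of $K = 20$ individuals of whom the first $K' = 10$ form the cross section (values chosen only so that $K'/K$ is large enough to matter). Before the correction the covariance of $u_t$ with a sampled $\varepsilon_{kt_0}$ should be close to $1/K = 0.05$, as in (5); after it, close to zero, with the variance of $u_t^\circ$ close to $0.01 + 0.05 - 0.025 = 0.035$, as in (7).

```python
import numpy as np

def transform_average(z_t, z_cs_mean, C_t_t0, K_sample, K_pop):
    """Transformation (6): z_t^o = z_t - (K'/K) C_{t t0} z_cs, applied in the
    same way to y-bar_t, x-bar_t and (implicitly) u_t."""
    return z_t - (K_sample / K_pop) * C_t_t0 * z_cs_mean

rng = np.random.default_rng(42)
K_pop, K_sample, reps, Omega_nu = 20, 10, 100_000, 0.01
eps = rng.normal(size=(reps, K_pop))                  # eps_{k t0}, Omega_eps = 1
nu = rng.normal(scale=np.sqrt(Omega_nu), size=reps)   # aggregate error nu_t
u = nu + eps.mean(axis=1)                             # u_t = nu_t + eps-bar_t
eps_cs_mean = eps[:, :K_sample].mean(axis=1)          # cross section average
u_o = transform_average(u, eps_cs_mean, 1.0, K_sample, K_pop)

cov_before = np.mean(u * eps[:, 0])    # (5): C_{t0 t0} Omega_eps / K = 0.05
cov_after = np.mean(u_o * eps[:, 0])   # zero after the transformation
var_after = np.mean(u_o ** 2)          # (7): 0.01 + 0.05 - 0.025 = 0.035

us_weight = 10_000 / 70_000_000        # K'/K for the U.S. example, about 1.4e-4
```

In this toy case $u_t^\circ$ reduces to $\nu_t$ plus $1/K$ times the summed errors of the individuals outside the cross section, which is why it no longer covaries with the sampled $\varepsilon_{kt_0}$. With $K'/K$ near $1.4 \times 10^{-4}$ the same weight would leave the averaged data essentially unchanged.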

The following examples illustrate different error structures, where we assume $K'/K$ is very small. We take $C_{tt'} = 0$, $t \neq t'$, for simplicity, deferring further discussion of this time series structure until we have presented the examples. In Examples 1 and 2 we take $\Omega_\nu^{tt'} = 0$ for $t \neq t'$.

Example 1 (Random Individual Errors): Suppose that $\nu_t$ arises because of an additional random component $\nu_{kt}$ at the individual level, which is distributed with mean zero and variance $\tilde\Omega_\nu$, so that $\nu_t = \sum_k \nu_{kt}/K$. Then $u_t = \sum_k(\nu_{kt} + \varepsilon_{kt})/K$, with:

$$ E(u_t u_{t'}') = 0, \qquad t \neq t'. $$

The second stage transformation is just a grouping correction, with $u_t^*$ of (8) given as $u_t^* = \sqrt{K}\,u_t$ and $\Omega_{u^*} = \tilde\Omega_\nu + \Omega_\varepsilon$.
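A sketch of the Example 1 grouping correction, assuming numpy. The text uses a single population size $K$; the function below accepts a separate size $K_t$ for each period (an assumption for illustration), which is the case in which the correction changes relative weights across periods. The Monte Carlo part draws individual errors with combined variance 1.5, an arbitrary value standing in for the variance of $\nu_{kt} + \varepsilon_{kt}$, and checks that the rescaled averages have that variance in every period.

```python
import numpy as np

def grouping_correction(z, K_t):
    """Scale row t of z (y-bar_t, x-bar_t' or u_t) by sqrt(K_t), so that
    u_t* = sqrt(K_t) u_t has the same variance in every period."""
    z = np.asarray(z, dtype=float)
    w = np.sqrt(np.asarray(K_t, dtype=float))
    return w.reshape(-1, *([1] * (z.ndim - 1))) * z

rng = np.random.default_rng(3)
K_t = np.array([4, 16, 64])           # hypothetical period-specific sizes
sigma2, reps = 1.5, 20_000            # variance of nu_kt + eps_kt (illustrative)
u = np.vstack([
    rng.normal(scale=np.sqrt(sigma2), size=(reps, k)).mean(axis=1) for k in K_t
])                                    # row t holds reps draws of u_t
u_star = grouping_correction(u, K_t)
var_star = u_star.var(axis=1)         # each entry should be close to 1.5
```

Without the correction the variance of $u_t$ would fall as $1.5/K_t$; after it, all periods share one variance, as required for (8).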


Example 2 (Common Time Effect): Suppose that $\nu_t$ represents a common disturbance in the aggregate data with $\Omega_\nu^{tt} = \Omega_\nu$ for all $t$. In practice one will usually encounter $K\Omega_\nu \gg \Omega_\varepsilon$, so that $u_t \approx \nu_t$ for purposes of estimation. Here no second stage correction is necessary, with $\Omega_{u^*} = \Omega_\nu$.

Example 3 (Autocorrelated Common Time Effect): Suppose that Example 2 is modified to $\nu_t = \gamma\nu_{t-1} + \omega_t$, where $\omega_t$ is distributed with mean zero, variance $\Omega_\omega$, and uncorrelated over time, with $K\Omega_\omega \gg \Omega_\varepsilon$. Then $\Omega_\nu^{tt} = \Omega_\omega/(1 - \gamma^2)$. As above, the contribution of $\sum_k \varepsilon_{kt}/K$ to $u_t$ is negligible, so that $u_t \approx \nu_t$. The second stage correction is now quasi-first differencing, replacing $\bar y_t$ and $\bar x_t$ by $\bar y_t - \gamma\bar y_{t-1}$ and $\bar x_t - \gamma\bar x_{t-1}$ (with the standard adjustment to the first observation). Of course, $u_t^* = \omega_t$ and $\Omega_{u^*} = \Omega_\omega$ in this case.
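A sketch of the Example 3 quasi-first differencing, assuming numpy and taking the "standard adjustment to the first observation" to be the Prais-Winsten scaling by $\sqrt{1 - \gamma^2}$ (our assumption; the text does not name it). Applied to a stationary AR(1) path, the filter recovers the innovations $\omega_t$ exactly.

```python
import numpy as np

def quasi_difference(z, gamma):
    """Quasi-first differencing for nu_t = gamma nu_{t-1} + omega_t.
    Rows are periods: row t >= 2 becomes z_t - gamma z_{t-1}, and row 1 is
    scaled by sqrt(1 - gamma^2) (Prais-Winsten, assumed here)."""
    z = np.asarray(z, dtype=float)
    out = np.empty_like(z)
    out[0] = np.sqrt(1.0 - gamma ** 2) * z[0]
    out[1:] = z[1:] - gamma * z[:-1]
    return out

gamma = 0.6
omega = np.array([1.0, 2.0, -1.0, 0.5])       # illustrative innovations
nu = np.empty_like(omega)
nu[0] = omega[0] / np.sqrt(1.0 - gamma ** 2)  # stationary starting value
for t in range(1, omega.size):
    nu[t] = gamma * nu[t - 1] + omega[t]
recovered = quasi_difference(nu, gamma)
```

Because the function operates on rows, the same call can be applied to $\bar y_t$, to $\bar x_t$, or to fitted values of the regression function.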

Now suppose that $C_{tt'} \neq 0$ for some $t \neq t'$, so that we have a nontrivial time series correlation structure for $\varepsilon_{kt}$. In Examples 2 and 3 above, the effect of $C_{tt'} \neq 0$ would be negligible, due to the unimportance of $\sum_k \varepsilon_{kt}/K$ in $u_t$. In Example 1, however, the time series structure is potentially important, since $\sqrt{K}\,u_t$ will have the same time series covariance structure as $\varepsilon_{kt}$ and would require consideration in the second stage of the transformation of observations.

Example 3 illustrates the cost of pooling with very general error structures. In particular, in Example 3 the parameter $\gamma$ is best relabeled as a component of $\theta$, with the transformed error covariance structure now determined by $\Omega_\varepsilon$ and $\Omega_{u^*} = \Omega_\omega$. The treatment of autocorrelation will involve augmenting the list of parameters to be estimated, with the remaining error structure characterized by $\Omega_\varepsilon$ and $\Omega_{u^*}$. This modeling approach is standard practice in time series analysis. Consequently, in Section 3 we discuss only the consistent estimation of the parameters $\Omega_\varepsilon$ and $\Omega_{u^*}$, which we will regard as positive definite but otherwise unrestricted.

Before discussing the additional assumptions required for estimation of the complete model, we introduce instrumental variables. It is often appropriate to treat the variables $x_{kt}$ and $p_t$ as endogenous for the individual observations, the aggregate observations, or both. This can occur when the model is a simultaneous equations model in exact aggregation form or part of a larger system of simultaneous equations. For example, in demand analysis observations on prices can reflect both supply and demand influences, requiring aggregate instruments. Alternatively, in a study of savings, errors in variables may necessitate instruments for the individual data, while in the average data such errors may be negligible.

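We assume that there are vectors of observations on instrumental variables, say $z_{kt_0}$ for the individual observations and $\bar z_t$ for the averaged observations. Denote as $Z_0$ and $\bar Z$ the matrices with rows $z_{kt_0}'$ and $\bar z_t'$, respectively, and as $Z$ the matrix:

$$ Z = \begin{pmatrix} Z_0 & 0 \\ 0 & \bar Z \end{pmatrix}. $$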

Finally, we must introduce regularity assumptions in order to characterize the NL3SLS estimator. We include these in the Appendix. The assumptions are that the coefficient functions $\beta(p_t, \theta)$ are twice continuously differentiable in the components of $\theta$, that the moment matrices defining the NL3SLS objective function converge to stable, well behaved limits, and that the parameter vector $\theta$ is identified. We collect all components of $\theta$ identified in the cross section in a set $\theta_0$, all parameters identified in the time series in a set $\bar\theta$, and all the remaining parameters, which are identified only by combining the two data sets, in a third set.

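3. The Nonlinear Three Stage Least Squares Estimator. The NL3SLS estimator $\hat\theta$ of the true parameter vector $\theta^*$ is found as the value of $\theta$ which minimizes:

$$ S(\theta) = (Y - \phi(\theta))'\,[\hat\Sigma^{-1} \otimes Z(Z'Z)^{-1}Z']\,(Y - \phi(\theta)), \qquad (11) $$

where:

$$ \hat\Sigma = \begin{pmatrix} \hat\Omega_\varepsilon & 0 \\ 0 & \hat\Omega_{u^*} \end{pmatrix} $$

is a consistent estimator of $\Sigma = \mathrm{diag}(\Omega_\varepsilon, \Omega_{u^*})$ as $K', T \to \infty$; the Kronecker product in (11) is taken block by block. The objective function $S(\theta)$ can be written more explicitly as: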

$$ S(\theta) = S_0(\theta) + \bar S(\theta), \qquad (12) $$

with:

$$ S_0(\theta) = (Y_0 - (I_N \otimes X_0)\beta(p_{t_0}, \theta))'\,[\hat\Omega_\varepsilon^{-1} \otimes Z_0(Z_0'Z_0)^{-1}Z_0']\,(Y_0 - (I_N \otimes X_0)\beta(p_{t_0}, \theta)), $$

$$ \bar S(\theta) = (\bar Y^* - f^*(\theta))'\,[\hat\Omega_{u^*}^{-1} \otimes \bar Z(\bar Z'\bar Z)^{-1}\bar Z']\,(\bar Y^* - f^*(\theta)), $$

where $S_0(\theta)$ and $\bar S(\theta)$ are NL3SLS objective functions for the cross section and average models individually. Obviously, the function $S_0(\theta)$ could be minimized to estimate the elements of $\theta_0$, for fixed values of the remaining parameters; similarly, $\bar S(\theta)$ could be minimized to estimate the elements of $\bar\theta$. If $\theta_0 = \theta$ and $\bar\theta = \theta$, then all parameters could be estimated from either data set. Minimizing (11) constrains the estimated values from cross section and time series data sets to be equal, which results in efficiency gains.

Note that the function $S_0(\theta)$ can be evaluated using only $\hat\Omega_\varepsilon$ and the moment matrices $Z_0'X_0$, $Z_0'Z_0$, and $(I_N \otimes Z_0')Y_0$. Thus for estimating $\theta$ or other more restricted parameterizations of $\beta(p_t, \theta)$, only one pass through the cross section data is required to construct these moments. This computational simplification results from exact aggregation.
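The one-pass property can be illustrated with a sketch, assuming numpy, residuals stacked equation by equation as in (3), and arbitrary simulated data: $S_0(\theta)$ computed from the full cross section agrees with the value computed from $Z_0'X_0$, $Z_0'Z_0$, and $Z_0'Y_0$ alone. The matrix `B`, whose $n$th row is $\beta_n(p_{t_0}, \theta)'$ at some trial $\theta$, is a stand-in for whatever parameterization is being estimated.

```python
import numpy as np

def s0_direct(Y0, X0, Z0, B, Omega_inv):
    """S_0(theta) from the raw cross section. Residuals are stacked equation by
    equation and weighted by Omega^{-1} kron Z0 (Z0'Z0)^{-1} Z0'."""
    resid = (Y0 - X0 @ B.T).T.reshape(-1)          # (r_1', ..., r_N')'
    P = Z0 @ np.linalg.solve(Z0.T @ Z0, Z0.T)
    return resid @ np.kron(Omega_inv, P) @ resid

def s0_from_moments(ZX, ZZ, ZY, B, Omega_inv):
    """The same value from Z0'X0, Z0'Z0 and Z0'Y0 only (no individual data)."""
    Zr = ZY - ZX @ B.T                             # column n is Z0' r_n
    G = Zr.T @ np.linalg.solve(ZZ, Zr)             # G[n, m] = r_n' P r_m
    return float(np.sum(Omega_inv * G))

rng = np.random.default_rng(1)
K_s, M, N = 200, 3, 2
X0 = rng.normal(size=(K_s, M))
Z0 = np.column_stack([X0, rng.normal(size=K_s)])   # instruments
Y0 = rng.normal(size=(K_s, N))
B = rng.normal(size=(N, M))                        # rows: beta_n(p_t0, theta)'
Omega_inv = np.linalg.inv(np.array([[1.0, 0.3], [0.3, 2.0]]))

moments = (Z0.T @ X0, Z0.T @ Z0, Z0.T @ Y0)        # one pass through the data
```

Once `moments` is stored, evaluating $S_0$ at a new trial value of $\theta$ costs only small matrix products whose size does not depend on $K'$.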

The estimation procedure consists of three steps: first, find consistent estimators of $\Omega_\varepsilon$ and $\Omega_{u^*}$; second, minimize (11) to obtain $\hat\theta$; third, calculate the asymptotic covariance matrix of $\hat\theta$. If the set of remaining parameters is not empty, then we cannot improve upon previous suggestions in the literature for finding consistent estimators of $\Omega_\varepsilon$ and $\Omega_{u^*}$. For example, Gallant (1977) suggests estimating each equation of the model by NL2SLS. This involves pooling both data sources on a single equation basis, forming $\hat\Omega_\varepsilon$ as the estimated residual covariance from the cross section data, and forming $\hat\Omega_{u^*}$ as the estimated residual covariance from the transformed average time series data.

The more usual situation is that the set of remaining parameters is empty, which suggests a simpler procedure. First, obtain consistent estimates of $\beta(p_{t_0}, \theta)$ by linear 2SLS estimation of each equation using the cross section data. The estimated residual covariance matrix $\hat\Omega_\varepsilon$ provides a consistent estimator of $\Omega_\varepsilon$ even if the set of remaining parameters is not empty. Using the consistent estimators of $\beta(p_{t_0}, \theta)$, solve for consistent estimates of the elements of $\theta_0$, say $\hat\theta_0^\circ$. Holding these parameters fixed at $\hat\theta_0^\circ$, estimate the remaining parameters of $\theta$ by applying NL2SLS to each equation of the model or NL3SLS to the system as a whole, using only the time series data. The estimated covariance matrix of the NL2SLS residuals, $\hat\Omega_{u^*}$, provides a consistent estimator of $\Omega_{u^*}$. In addition, this procedure usually produces good starting values for $\theta$ to use in minimizing (11).
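The first step of this simpler procedure can be sketched as equation-by-equation linear 2SLS on the cross section, assuming numpy and simulated data with one endogenous regressor (all names and values here are illustrative). The residual covariance is computed from the structural residuals $Y_0 - X_0\hat B'$, not from second stage fitted values.

```python
import numpy as np

def two_sls(y, X, Z):
    """Linear 2SLS for one equation: (X'PX)^{-1} X'Py with P = Z(Z'Z)^{-1}Z'."""
    X_hat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)    # first stage fitted values P X
    return np.linalg.solve(X_hat.T @ X, X_hat.T @ y)

def first_step_cross_section(Y0, X0, Z0):
    """Equation-by-equation 2SLS on the cross section. Returns B_hat, whose
    n-th row estimates beta_n(p_t0, theta)', and the residual covariance."""
    B_hat = np.vstack([two_sls(Y0[:, n], X0, Z0) for n in range(Y0.shape[1])])
    resid = Y0 - X0 @ B_hat.T          # structural residuals use X0, not X_hat
    Omega_eps_hat = resid.T @ resid / Y0.shape[0]
    return B_hat, Omega_eps_hat

rng = np.random.default_rng(7)
K_s = 5000
Z0 = np.column_stack([np.ones(K_s), rng.normal(size=(K_s, 2))])
v = rng.normal(size=K_s)
X0 = np.column_stack([np.ones(K_s), Z0[:, 1] + Z0[:, 2] + v])   # endogenous regressor
B_true = np.array([[1.0, 0.5], [-0.5, 2.0]])
eps = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.4], [0.4, 1.0]], size=K_s)
eps[:, 0] += 0.8 * v                   # makes X0 correlated with the first error
Y0 = X0 @ B_true.T + eps
B_hat, Omega_eps_hat = first_step_cross_section(Y0, X0, Z0)
```

With this design the first error has variance $1 + 0.8^2 = 1.64$, so `Omega_eps_hat` should be near $\begin{pmatrix} 1.64 & 0.4 \\ 0.4 & 1 \end{pmatrix}$.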

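The objective function (11) can be minimized using a variety of well known computational methods. A convenient method that illustrates pooling cross section and time series data is the Gauss-Newton process. To discuss this method we require the following notation: let $B_0(\theta)$ and $\Phi(\theta)$ denote the matrices:

$$ B_0(\theta) = \begin{pmatrix} B_{01}(\theta) \\ B_{02}(\theta) \\ \vdots \\ B_{0N}(\theta) \end{pmatrix}, \qquad \Phi(\theta) = \begin{pmatrix} \Phi_1(\theta) \\ \Phi_2(\theta) \\ \vdots \\ \Phi_T(\theta) \end{pmatrix}, $$

where $B_{0n}(\theta) = \partial\beta_n(p_{t_0}, \theta)/\partial\theta'$, and $\Phi(\theta)$ is the matrix of derivatives of $f^*(\theta)$ with respect to $\theta'$; before the second step transformation, $\Phi_t(\theta)$ is the matrix with elements $\partial[\bar x_t'\beta_n(p_t, \theta)]/\partial\theta_l$.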

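The Gauss-Newton process is an iterative procedure for finding $\hat\theta$ from an initial value $\hat\theta_1$. At the $i$th iteration, the current value $\hat\theta_i$ is updated to $\hat\theta_{i+1} = \hat\theta_i + \Delta\hat\theta_i$ by first linearizing the system (9) with respect to $\theta$ as:

$$ Y_0 - (I_N \otimes X_0)\,\beta(p_{t_0}, \hat\theta_i) = (I_N \otimes X_0)\,B_0(\hat\theta_i)\,\Delta\hat\theta_i + \varepsilon, $$
$$ \bar Y^* - f^*(\hat\theta_i) = \Phi(\hat\theta_i)\,\Delta\hat\theta_i + u^*. \qquad (13) $$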
1  11 

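We then apply Zellner and Theil's (1962) linear three stage least squares method to the model (13), obtaining:

$$ \Delta\hat\theta_i = (M_{x0} + \bar M_x)^{-1}(M_{\varepsilon 0} + \bar M_u), \qquad (14) $$

where:

$$ M_{x0} = B_0(\hat\theta_i)'\,(\hat\Omega_\varepsilon^{-1} \otimes X_0'Z_0(Z_0'Z_0)^{-1}Z_0'X_0)\,B_0(\hat\theta_i), $$
$$ \bar M_x = \Phi(\hat\theta_i)'\,(\hat\Omega_{u^*}^{-1} \otimes \bar Z(\bar Z'\bar Z)^{-1}\bar Z')\,\Phi(\hat\theta_i), $$
$$ M_{\varepsilon 0} = B_0(\hat\theta_i)'\,(\hat\Omega_\varepsilon^{-1} \otimes X_0'Z_0(Z_0'Z_0)^{-1}Z_0')\,(Y_0 - (I_N \otimes X_0)\,\beta(p_{t_0}, \hat\theta_i)), $$
$$ \bar M_u = \Phi(\hat\theta_i)'\,(\hat\Omega_{u^*}^{-1} \otimes \bar Z(\bar Z'\bar Z)^{-1}\bar Z')\,(\bar Y^* - f^*(\hat\theta_i)). $$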

Convergence to $\hat\theta$ is achieved when $\Delta\hat\theta_i$ becomes sufficiently small. Following Hartley (1961) we check whether $S(\hat\theta_i + \Delta\hat\theta_i) < S(\hat\theta_i)$; if not, we shrink $\Delta\hat\theta_i$ by forming $\hat\theta_{i+1} = \hat\theta_i + \Delta\hat\theta_i/2$. We continue halving until an improvement in $S$ is found, whereupon a new iteration is performed; alternatively, if the current increment falls below a convergence criterion, we have found the minimum.
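The Gauss-Newton process with Hartley's step halving can be sketched generically, assuming numpy. Each data set enters as a block (residual function, Jacobian, weight matrix) and the increment is the pooled formula (14). For brevity the toy example uses identity matrices scaled by inverse error variances in place of $[\hat\Omega^{-1} \otimes Z(Z'Z)^{-1}Z']$, and a hypothetical coefficient $\beta(p_t, \theta) = \theta_1 p_t^{\theta_2}$ with $p_{t_0} = 1$, so the cross section identifies $\theta_1$ only while the averaged time series identifies both parameters. The returned matrix is the estimate $(M_{x0} + \bar M_x)^{-1}$ of (15).

```python
import numpy as np

def gauss_newton(theta, blocks, tol=1e-10, max_iter=100):
    """Minimize S(theta) = sum_b r_b' W_b r_b by Gauss-Newton with Hartley
    step halving. Each block is (resid_fn, jac_fn, W): resid_fn returns
    Y_b - g_b(theta) and jac_fn returns d g_b / d theta'."""
    def S(th):
        return sum(r(th) @ W @ r(th) for r, _, W in blocks)

    for _ in range(max_iter):
        M = sum(J(theta).T @ W @ J(theta) for _, J, W in blocks)   # M_x0 + M_x
        m = sum(J(theta).T @ W @ r(theta) for r, J, W in blocks)   # M_e0 + M_u
        step = np.linalg.solve(M, m)                                # eq. (14)
        s_old = S(theta)
        while S(theta + step) >= s_old and np.max(np.abs(step)) > tol:
            step = step / 2.0                                       # Hartley halving
        theta = theta + step
        if np.max(np.abs(step)) <= tol:
            break
    M = sum(J(theta).T @ W @ J(theta) for _, J, W in blocks)
    return theta, np.linalg.inv(M)

rng = np.random.default_rng(3)
theta_true = np.array([2.0, -0.7])
x0 = rng.uniform(1, 3, size=300)                   # cross section at p_t0 = 1
y0 = theta_true[0] * x0 + rng.normal(scale=0.5, size=x0.size)
p = rng.uniform(0.5, 2.0, size=40)                 # averaged time series
xb = rng.uniform(1, 3, size=40)
yb = xb * theta_true[0] * p ** theta_true[1] + rng.normal(scale=0.1, size=40)

blocks = [
    (lambda th: y0 - th[0] * x0,
     lambda th: np.column_stack([x0, np.zeros_like(x0)]),
     np.eye(x0.size) / 0.5 ** 2),
    (lambda th: yb - xb * th[0] * p ** th[1],
     lambda th: np.column_stack([xb * p ** th[1], xb * th[0] * p ** th[1] * np.log(p)]),
     np.eye(p.size) / 0.1 ** 2),
]
theta_hat, avar = gauss_newton(np.array([1.0, 0.0]), blocks)
```

The cross section block contributes only to the $\theta_1$ row and column of the pooled matrix, which is how information the cross section cannot supply on $\theta_2$ is left to the time series.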

Under our assumptions the NL3SLS estimator $\hat\theta$ is consistent for $\theta^*$ as $K', T \to \infty$, and asymptotically normal with asymptotic covariance matrix:

$$ \mathrm{AVAR}(\hat\theta) = (M_{x0}^* + \bar M_x^*)^{-1}, \qquad (15) $$

where the moment matrices are evaluated at the true values $\Omega_\varepsilon$ and $\Omega_{u^*}$. The precise form of the limiting normal distribution depends on the way that $K'$ and $T$ approach infinity; however, a consistent estimator of $\mathrm{AVAR}(\hat\theta)$ is given by $(M_{x0} + \bar M_x)^{-1}$, evaluated at $\hat\theta$, in any case.
■'         xo  x'  ^ 

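First, closer inspection of (14) indicates the relationship of NL3SLS to linear pooling estimators. For example, if $\theta_0 = \theta$ and $\bar\theta = \theta$, then:

$$ \Delta\hat\theta_i = (M_{x0} + \bar M_x)^{-1}(M_{x0}\,\Delta\hat\theta_{0i} + \bar M_x\,\Delta\bar\theta_i), $$

where $\Delta\hat\theta_{0i} = M_{x0}^{-1}M_{\varepsilon 0}$ and $\Delta\bar\theta_i = \bar M_x^{-1}\bar M_u$ are the Gauss-Newton increments from minimizing $S_0(\theta)$ and $\bar S(\theta)$ respectively. Thus, $\Delta\hat\theta_i$ is just a matrix weighted average of the individual increments.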

Second, if the cross section data are exogenous, then $Z_0 = X_0$, and both $M_{x0}$ and $M_{\varepsilon 0}$ can be evaluated using the moments $X_0'X_0$ and $(I_N \otimes X_0)'Y_0$. In this case one can obtain $\hat\Omega_\varepsilon$ from the cross section residuals of each equation estimated by OLS.

Third, additional cross section data sets can be incorporated in a straightforward manner. If an additional cross section is available for time $t_1$ (or $t_0$, for that matter) with data $Y_1$, $X_1$, and $Z_1$, then $B_1(\theta)$, $M_{x1}$, and $M_{\varepsilon 1}$ are formed as above, and $M_{x1}$ and $M_{\varepsilon 1}$ enter additively into the first and second terms of (14). The proper correction (7) must be applied to the aggregate data series. In this way all of the available cross section information can be used in estimating the vector of parameters $\theta$.
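The matrix weighted average interpretation and the additive way further cross sections enter (14) can be verified with a sketch, assuming numpy. The positive definite matrices and vectors below are random stand-ins for $M_{x0}$, $\bar M_x$, $M_{\varepsilon 0}$, $\bar M_u$, and an extra pair $M_{x1}$, $M_{\varepsilon 1}$, not quantities from any model; separate increments exist only when each data set identifies all of $\theta$.

```python
import numpy as np

def pooled_increment(M_list, m_list):
    """Increment (14): (sum of M blocks)^{-1} (sum of m blocks). Each data set,
    including any additional cross section, contributes one (M, m) pair."""
    return np.linalg.solve(sum(M_list), sum(m_list))

rng = np.random.default_rng(5)

def random_pd(k):
    """Random symmetric positive definite k x k matrix (stand-in only)."""
    A = rng.normal(size=(k, k))
    return A @ A.T + k * np.eye(k)

L = 3
Ms = [random_pd(L) for _ in range(3)]        # M_x0, M_x-bar, M_x1
ms = [rng.normal(size=L) for _ in range(3)]  # M_eps0, M_u-bar, M_eps1
separate = [np.linalg.solve(M, m) for M, m in zip(Ms, ms)]   # per-data-set increments

two_sets = pooled_increment(Ms[:2], ms[:2])
weighted_two = np.linalg.solve(Ms[0] + Ms[1], Ms[0] @ separate[0] + Ms[1] @ separate[1])
three_sets = pooled_increment(Ms, ms)
weighted_three = np.linalg.solve(sum(Ms), sum(M @ d for M, d in zip(Ms, separate)))
```

In both cases the pooled increment equals the weighted average of the separate increments, with each data set weighted by its own $M$ matrix.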


4. Parametric Hypothesis Tests. Statistical hypotheses take the form $\theta = g(\rho)$, where $\rho$ is a vector of dimension $R$, less than the dimension $L$ of $\theta$. Our interest is in testing the hypothesis that $\theta = g(\rho)$ against the alternative $\theta \neq g(\rho)$. For this task, we require two additional assumptions, listed in the Appendix, which indicate that $\rho$ is identified and that the disturbances $\varepsilon_{kt}$ and $u_t^*$ are normally distributed.

The test statistic of interest is found as follows. Let $S_r(\rho)$ denote the objective function:

$$ S_r(\rho) = (Y - \phi(g(\rho)))'\,[\hat\Sigma^{-1} \otimes Z(Z'Z)^{-1}Z']\,(Y - \phi(g(\rho))). \qquad (16) $$

Denote by $\hat\rho$ the value of $\rho$ which minimizes $S_r(\rho)$. Under our assumptions Gallant and Jorgenson (1979) have shown that the statistic

$$ \tau = S_r(\hat\rho) - S(\hat\theta) \qquad (17) $$

is asymptotically distributed as chi-square with $L - R$ degrees of freedom under the null hypothesis. The appropriate test statistic is provided by $\tau$.

The minimization of $S_r(\rho)$ to find $\hat\rho$ is analogous to the procedure for finding $\hat\theta$, and requires only moment matrices from the cross section. Although any consistent estimator of $\Sigma$ can be used in evaluating (16), the monotonicity condition $S_r(\hat\rho) - S(\hat\theta) \geq 0$ will be guaranteed only if the same $\hat\Sigma$ is used to evaluate both $S_r$ and $S$. Thus, the original consistent estimates $\hat\Omega_\varepsilon$ and $\hat\Omega_{u^*}$ used in estimating $\hat\theta$ should be used in finding estimators for restricted versions of the model.
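In the linear special case $\phi(\theta) = F\theta$ with $g(\rho) = G\rho$, both minimizations have closed forms, which makes the statistic (17) easy to sketch. The example assumes numpy, a single equation with unit error variance (so $\hat\Sigma = 1$), instruments $Z$ containing the columns of $F$, and simulated data satisfying the null; `chi2_sf` is a small helper for the chi-square upper tail with integer degrees of freedom, written here to avoid extra dependencies. Using the same weight matrix for both fits makes $\tau \geq 0$ hold exactly, which is the monotonicity point above.

```python
import math
import numpy as np

def chi2_sf(x, df):
    """Upper tail of chi-square(df), integer df >= 1, via the recursion
    Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) exp(-x/2) / Gamma(k/2 + 1)."""
    x = max(float(x), 0.0)
    if df % 2 == 0:
        q, k = math.exp(-x / 2.0), 2
    else:
        q, k = math.erfc(math.sqrt(x / 2.0)), 1
    while k < df:
        q += (x / 2.0) ** (k / 2.0) * math.exp(-x / 2.0) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return q

def weighted_ls(y, F, W):
    """Minimize (y - F b)' W (y - F b); return the minimizer and the minimum."""
    b = np.linalg.solve(F.T @ W @ F, F.T @ W @ y)
    r = y - F @ b
    return b, r @ W @ r

rng = np.random.default_rng(11)
n, L, R = 400, 3, 2
F = rng.normal(size=(n, L))                          # stands in for d phi / d theta'
Z = np.column_stack([F, rng.normal(size=(n, 2))])    # instruments
W = Z @ np.linalg.solve(Z.T @ Z, Z.T)                # unit error variance assumed
G = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]])  # theta = g(rho) = G rho
y = F @ (G @ np.array([0.5, 1.5])) + rng.normal(size=n)   # null hypothesis holds

theta_hat, S_unrestricted = weighted_ls(y, F, W)
rho_hat, S_restricted = weighted_ls(y, F @ G, W)
tau = S_restricted - S_unrestricted                  # statistic (17)
p_value = chi2_sf(tau, L - R)
```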

5. Estimation Subject to Inequality Restrictions. The final topic we consider is the estimation of the parameter vector θ subject to inequality restrictions. For example, an integrable demand system must obey the condition that the Slutsky matrix of compensated price derivatives is negative semi-definite. The unconstrained estimator $\hat\theta$ need not obey these restrictions in finite samples; thus, it may be desirable to impose them. We represent such restrictions formally as:

$$\phi_m(\theta) \ge 0, \qquad (m = 1, 2, \ldots, M') \qquad (18)$$

where we assume $\phi_m$ to be twice continuously differentiable in each component of θ.
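To make (18) concrete, consider a hypothetical case in which three parameters $s_{11}, s_{12}, s_{22}$ determine a 2 × 2 symmetric Slutsky matrix at a reference point. Such a matrix is negative semi-definite exactly when the principal minors of its negative are nonnegative, which yields M' = 3 functions $\phi_m$ of the required form, together with their derivative matrix. This parameterization is an assumption for illustration; applied demand systems may impose the restriction differently.

```python
import numpy as np

def slutsky_constraints(theta):
    """Hypothetical phi(theta) for a 2x2 symmetric Slutsky matrix.

    theta = (s11, s12, s22). The matrix is negative semi-definite iff every
    returned component is >= 0 (principal minors of -S nonnegative).
    """
    s11, s12, s22 = theta
    return np.array([
        -s11,                    # phi_1: first diagonal element <= 0
        -s22,                    # phi_2: second diagonal element <= 0
        s11 * s22 - s12 ** 2,    # phi_3: determinant >= 0
    ])

def constraint_jacobian(theta):
    """Phi(theta): M' x L matrix with elements d phi_m / d theta_j."""
    s11, s12, s22 = theta
    return np.array([
        [-1.0, 0.0, 0.0],
        [0.0, 0.0, -1.0],
        [s22, -2.0 * s12, s11],
    ])

theta = np.array([-2.0, 1.0, -1.0])
print(slutsky_constraints(theta))   # all >= 0, so this S is negative semi-definite
```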

The inequality constrained estimator $\bar\theta$ minimizes S(θ) subject to the constraints (18). This estimator corresponds to a saddlepoint of the Lagrangian function:

$$\mathcal{L} = S(\theta) - \lambda'\phi(\theta), \qquad (19)$$

where λ is a vector of M' Lagrange multipliers and φ is the M' vector of constraint functions. The Kuhn-Tucker (1951) conditions for a saddlepoint of this Lagrangian are:

$$\nabla_\theta \mathcal{L} = \nabla_\theta S(\bar\theta) - \Phi(\bar\theta)'\lambda = 0,$$

and the complementary slackness condition:

$$\lambda'\phi(\bar\theta) = 0, \qquad \lambda \ge 0,$$

where Φ(θ) is the M' × L matrix with elements $\partial\phi_m/\partial\theta_j$.
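The conditions above can be checked numerically at a candidate solution. The sketch below assumes the sign convention in which multipliers λ ≥ 0 are paired with constraints φ(θ) ≥ 0 in a minimization, so that stationarity reads $\nabla_\theta S(\bar\theta) - \Phi(\bar\theta)'\lambda = 0$. The toy criterion $S(\theta) = \|\theta - a\|^2$ with the single constraint $\theta_1 \ge 0$ is hypothetical.

```python
import numpy as np

def kkt_residuals(grad_S, phi_val, Phi, lam):
    """Violations of: grad S - Phi' lam = 0, phi >= 0, lam >= 0, lam' phi = 0."""
    stationarity = np.max(np.abs(grad_S - Phi.T @ lam))
    feasibility = max(0.0, -np.min(phi_val))
    sign = max(0.0, -np.min(lam))
    slackness = abs(lam @ phi_val)
    return tuple(float(v) for v in (stationarity, feasibility, sign, slackness))

# Toy problem: S(theta) = ||theta - a||^2 subject to phi(theta) = theta_1 >= 0.
a = np.array([-1.0, 2.0])
theta_bar = np.array([0.0, 2.0])       # constrained minimizer
grad_S = 2.0 * (theta_bar - a)         # gradient of S at theta_bar: (2, 0)
Phi = np.array([[1.0, 0.0]])           # d phi / d theta
phi_val = np.array([theta_bar[0]])     # binding constraint: phi = 0
lam = np.array([2.0])                  # Lagrange multiplier
print(kkt_residuals(grad_S, phi_val, Phi, lam))   # all zero
```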
To obtain the estimator $\bar\theta$ we begin by linearizing the model as in (13). Next, we linearize the constraints as:

$$\phi(\hat\theta_{i+1}) \approx \Phi(\hat\theta_i)\,\Delta\theta_i + \phi(\hat\theta_i),$$

where $\hat\theta_i$ is the current iteration value of the unknown parameters and $\hat\theta_{i+1} = \hat\theta_i + \Delta\theta_i$. We then apply Liew's (1976) inequality constrained linear three stage least squares method to the linear model, obtaining:

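$$\Delta\theta_i^* = \Delta\hat\theta_i + (M_{xo} + M_x)^{-1}\,\Phi(\hat\theta_i)'\lambda^*,$$

where $\Delta\hat\theta_i$ is given by (14) and λ* is the solution of the linear complementarity problem:

$$\lambda^{*\prime}\big[\Phi(\hat\theta_i)(M_{xo} + M_x)^{-1}\Phi(\hat\theta_i)'\lambda^* + \Phi(\hat\theta_i)\Delta\hat\theta_i + \phi(\hat\theta_i)\big] = 0,$$

$$\Phi(\hat\theta_i)(M_{xo} + M_x)^{-1}\Phi(\hat\theta_i)'\lambda^* + \Phi(\hat\theta_i)\Delta\hat\theta_i + \phi(\hat\theta_i) \ge 0, \qquad \lambda^* \ge 0.$$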
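The linear complementarity problem can be solved by several algorithms. As a swapped-in technique, and not necessarily the procedure used by Liew (1976), the sketch below applies projected Gauss-Seidel, which converges when $A = \Phi(\hat\theta_i)(M_{xo}+M_x)^{-1}\Phi(\hat\theta_i)'$ is symmetric positive definite, as it is when Φ has full row rank. The argument `M` stands for $M_{xo} + M_x$; the function names and numbers are illustrative.

```python
import numpy as np

def solve_lcp_pgs(A, q, tol=1e-12, max_iter=10_000):
    """Projected Gauss-Seidel for the LCP: w = A lam + q >= 0, lam >= 0, lam'w = 0.

    Assumes A is symmetric positive definite, which guarantees convergence.
    """
    lam = np.zeros_like(q, dtype=float)
    for _ in range(max_iter):
        lam_old = lam.copy()
        for i in range(q.size):
            r = q[i] + A[i] @ lam                # current value of w_i
            lam[i] = max(0.0, lam[i] - r / A[i, i])
        if np.max(np.abs(lam - lam_old)) < tol:
            break
    return lam

def constrained_step(M, dtheta_hat, Phi, phi_val):
    """Inequality constrained increment dtheta_hat + M^{-1} Phi' lam*.

    M plays the role of M_xo + M_x, dtheta_hat of the unconstrained step, and
    Phi, phi_val of the constraint Jacobian and values at the current iterate.
    """
    Minv_PhiT = np.linalg.solve(M, Phi.T)
    A = Phi @ Minv_PhiT                          # Phi M^{-1} Phi'
    q = Phi @ dtheta_hat + phi_val               # linearized constraints at dtheta_hat
    lam = solve_lcp_pgs(A, q)
    return dtheta_hat + Minv_PhiT @ lam, lam

# One binding constraint phi(theta) = theta_1 >= 0 at theta_i = (0.5, 0.5).
step, lam = constrained_step(
    M=np.eye(2),
    dtheta_hat=np.array([-1.0, 0.3]),
    Phi=np.array([[1.0, 0.0]]),
    phi_val=np.array([0.5]),
)
print(step, lam)   # step = [-0.5, 0.3], lam = [0.5]
```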

Given $\hat\theta_i$ that satisfies the constraints (18), we update to $\hat\theta_{i+1} = \hat\theta_i + \Delta\theta_i^*$ and check both that $S(\hat\theta_{i+1}) < S(\hat\theta_i)$ and that $\phi_m(\hat\theta_{i+1}) \ge 0$, m = 1, 2, ..., M'. If not, we shrink the increment vector as before, until either improvement is found or the increment values fall in absolute value below a convergence criterion. This concludes our discussion of the NL3SLS estimator.
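The acceptance test for each iteration can be sketched as follows. Halving the increment is assumed here as the shrinking rule; the criterion, constraint function, and tolerance in the example are hypothetical.

```python
import numpy as np

def accept_step(S, phi, theta, step, tol=1e-10, max_halvings=60):
    """Halve `step` until S decreases and every phi_m stays >= 0.

    Returns (theta_new, accepted). Halving is an assumed shrinking rule.
    """
    S_old = S(theta)
    for _ in range(max_halvings):
        candidate = theta + step
        if S(candidate) < S_old and np.all(phi(candidate) >= 0):
            return candidate, True
        step = step / 2.0
        if np.max(np.abs(step)) < tol:
            break
    return theta, False          # increments fell below the convergence criterion

# Hypothetical criterion and constraint theta_1 >= 0.
S_toy = lambda th: float(np.sum((th - np.array([1.0, 0.0])) ** 2))
phi_toy = lambda th: np.array([th[0]])
new, ok = accept_step(S_toy, phi_toy, np.array([0.2, 0.5]), np.array([2.0, -0.5]))
print(new, ok)   # full step overshoots; the halved step (1.0, -0.25) is accepted
```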

6. Conclusion and Applications. In this paper we have discussed the nonlinear three stage least squares method of pooling average time series and cross section data. There are two major advantages of this technique. The first is the identification of parameters and the gains in efficiency in estimation. For example, by pooling average time series and cross section data, models can be estimated that account for a large number of specific demographic effects in consumer behavior in both microeconomic and macroeconomic settings. Such effects are difficult to identify or estimate precisely using aggregate time series data alone. Conversely, the effects of time varying factors such as price levels, which are constant across consumers in each time period, may be impossible to identify using only data from a single cross section survey. Both effects can be estimated when cross section observations are pooled with average time series observations.

The second major advantage of the nonlinear three stage least squares technique is ease of computation. While exact aggregation models can allow for substantial nonlinearities in variables representing common influences on behavior as well as in parameters, cross section data are employed in pooled estimation through moment matrices. These matrices can be constructed utilizing only one pass through each cross section data source. This feature substantially reduces the time and expense of performing iterations to estimate a nonlinear model and the cost of estimating several restricted versions of the same model for hypothesis testing.
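A minimal sketch of the one-pass computation, assuming a single cross section in which the coefficients $P_n(p_{t_o}, \theta)$ are common to all individuals: the residuals of equation n are then $y_n - X b_n$ for a fixed vector $b_n$, so $Z'U = Z'Y - Z'XB$ can be formed from accumulated moments for any trial value of θ. The class name, chunk size, and simulated data are hypothetical.

```python
import numpy as np

class CrossSectionMoments:
    """One-pass accumulator for the cross section moment matrices Z'Z, Z'X, Z'Y.

    Hypothetical helper. With p_t fixed within the cross section, residuals are
    Y - X B for a coefficient matrix B that is the same for every individual.
    """

    def __init__(self, n_instr, n_x, n_eq):
        self.ZZ = np.zeros((n_instr, n_instr))
        self.ZX = np.zeros((n_instr, n_x))
        self.ZY = np.zeros((n_instr, n_eq))

    def update(self, Z, X, Y):
        """Add one chunk of rows (individuals) to the running sums."""
        self.ZZ += Z.T @ Z
        self.ZX += Z.T @ X
        self.ZY += Z.T @ Y

    def objective(self, B, Sigma_inv):
        """tr(Sigma^{-1} U' P_Z U) with U = Y - X B, from the moments only.

        B is n_x x n_eq; column n holds the coefficients P_n(p_t, theta).
        """
        ZU = self.ZY - self.ZX @ B                   # Z'U without revisiting the data
        UPU = ZU.T @ np.linalg.solve(self.ZZ, ZU)    # U' Z (Z'Z)^{-1} Z' U
        return float(np.trace(Sigma_inv @ UPU))

# Usage: stream a simulated cross section in chunks, then evaluate at a trial B.
rng = np.random.default_rng(2)
K, n_instr, n_x, n_eq = 1000, 4, 3, 2
Z = rng.normal(size=(K, n_instr))
X = np.column_stack([np.ones(K), Z[:, 1], Z[:, 2] + rng.normal(size=K)])
Y = rng.normal(size=(K, n_eq))
mom = CrossSectionMoments(n_instr, n_x, n_eq)
for start in range(0, K, 250):                       # single pass, 250 rows at a time
    rows = slice(start, start + 250)
    mom.update(Z[rows], X[rows], Y[rows])
B_trial = rng.normal(size=(n_x, n_eq))               # hypothetical P_n(p_t, theta) columns
print(mom.objective(B_trial, np.eye(n_eq)))
```

Each iteration of the nonlinear estimator then costs only small matrix operations, however large the cross section.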

We have applied the techniques described in this paper to models of aggregate consumer behavior for the United States. Models describing consumer budget allocation among broad commodity classes are presented by Jorgenson, Lau, and Stoker (1980, 1981, 1982). These models are estimated from annual time series data for 1958-1974, together with cross section data from 1972. Inequality constrained estimation is required to assure that the resulting Slutsky matrices are negative semi-definite.

A model describing the allocation of total energy expenditures to specific energy types is presented by Jorgenson and Stoker (1983). This model is estimated using annual time series average data from 1958-1978, together with five cross section data bases. Parametric hypothesis tests are performed to test for separability of preferences and for the possibility of structural change. Finally, Jorgenson, Slesnick, and Stoker (1983) have presented models of two stage budgeting. At the first stage the consumer budget is allocated between energy and nonenergy commodities. At the second stage the energy budget is allocated among types of energy.
APPENDIX: TECHNICAL ASSUMPTIONS

Below we list the assumptions required to establish consistency and asymptotic normality for the NL3SLS estimator. Assumptions 1-5 follow Gallant (1977) and Assumptions 6-7 follow Gallant and Jorgenson (1979).

Assumption 1: The parameter space of θ, say Θ, is compact, with the true value an interior point.

Assumption 2: The components of $P_n(p_t, \theta)$ (n = 1, 2, ..., N) are twice continuously differentiable in each component $\theta_j$ of θ.

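For the next two assumptions, we use the notation $P_n^{j}(p_t, \theta)$ and $P_n^{ij}(p_t, \theta)$ to refer to the vectors:

$$P_n^{j}(p_t, \theta) = \left(\frac{\partial P_n^1}{\partial\theta_j}, \frac{\partial P_n^2}{\partial\theta_j}, \ldots\right)',$$

$$P_n^{ij}(p_t, \theta) = \left(\frac{\partial^2 P_n^1}{\partial\theta_i\,\partial\theta_j}, \frac{\partial^2 P_n^2}{\partial\theta_i\,\partial\theta_j}, \ldots\right)',$$

where $P_n^m$ is the mth component of $P_n(p_t, \theta)$.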

Assumption 3A (Cross Section). The matrix $\frac{1}{N}Z'Z$ converges to a positive definite matrix as N → ∞. The Cesàro sums

$$\frac{1}{N}\sum_k \big(y_{nkt_o} - x_{kt_o}'P_n(p_{t_o}, \theta)\big)\big(y_{jkt_o} - x_{kt_o}'P_j(p_{t_o}, \theta)\big),$$

$$\frac{1}{N}\sum_k z_{kt_o}\big(y_{nkt_o} - x_{kt_o}'P_n(p_{t_o}, \theta)\big),$$

$$\frac{1}{N}\sum_k z_{kt_o}\big(x_{kt_o}'P_n^{j}(p_{t_o}, \theta)\big),$$

converge almost surely uniformly in θ (n, j = 1, 2, ..., N). The sums

$$\frac{1}{N}\sum_k \sup_\Theta \big|z_{skt_o}\big(x_{kt_o}'P_n^{j}(p_{t_o}, \theta)\big)\big|,$$

$$\frac{1}{N}\sum_k \sup_\Theta \big|z_{skt_o}\big(x_{kt_o}'P_n^{ij}(p_{t_o}, \theta)\big)\big|,$$

are bounded almost surely for all (n, j = 1, 2, ..., N; s = 1, 2, ..., S), where $z_{skt_o}$ is the sth component of $z_{kt_o}$.

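Assumption 3B (Time Series). The matrix $\frac{1}{T}Z'Z$ converges to a positive definite matrix as T → ∞. The Cesàro sums

$$\frac{1}{T}\sum_t \big(y_{nt} - x_t'P_n(p_t, \theta)\big)\big(y_{jt} - x_t'P_j(p_t, \theta)\big),$$

$$\frac{1}{T}\sum_t z_t\big(y_{nt} - x_t'P_n(p_t, \theta)\big),$$

$$\frac{1}{T}\sum_t z_t\big(x_t'P_n^{j}(p_t, \theta)\big),$$

converge almost surely in θ (n, j = 1, 2, ..., N). The sums

$$\frac{1}{T}\sum_t \sup_\Theta \big|z_{st}\big(x_t'P_n^{ij}(p_t, \theta)\big)\big|,$$

$$\frac{1}{T}\sum_t \sup_\Theta \big|z_{st}\big(x_t'P_n^{j}(p_t, \theta)\big)\big|,$$

are bounded almost surely (n, j = 1, 2, ..., N), where $z_{st}$ is the sth component of $z_t$.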

Assumption 4: The matrix

$$\lim_{N,T\to\infty} \frac{1}{N+T}\,(M_{xo} + M_x)$$

is nonsingular, where $M_{xo}$ and $M_x$ are defined in equations (14) and (15).
Assumption 5: (Identification). θ of (11) is identified by the instrumental variables $z_{kt_o}$ and $z_t$; that is, the only solution of the almost sure limits

$$\lim_{N\to\infty} \frac{1}{N}\sum_k z_{kt_o}\big(y_{nkt_o} - x_{kt_o}'P_n(p_{t_o}, \theta)\big) = 0, \quad (n = 1, 2, \ldots, N) \qquad \text{(AP1)}$$

$$\lim_{T\to\infty} \frac{1}{T}\sum_t z_t\big(y_{nt} - x_t'P_n(p_t, \theta)\big) = 0, \quad (n = 1, 2, \ldots, N) \qquad \text{(AP2)}$$

is the true value θ*.


Assumption 6: (Parameter Restriction). The function g(ρ) is a twice continuously differentiable mapping of a compact set P into the parameter space Θ. There is only one point ρ* in P which satisfies g(ρ*) = θ*, and ρ* is an interior point of P. The L × R matrix G(ρ*) has rank R, where the n, jth element of G is $\partial g_n/\partial\rho_j$, $g_n$ is the nth component of g(ρ), and $\rho_j$ is the jth component of ρ.
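The rank condition in Assumption 6 can be checked numerically for a given restriction. The sketch below approximates $G(\rho^*)$ by central differences for a hypothetical mapping with L = 3 and R = 2, namely $\theta_3 = \theta_1\theta_2$; the mapping and evaluation point are assumptions chosen for illustration.

```python
import numpy as np

def jacobian_fd(g, rho, h=1e-6):
    """Central-difference G(rho): L x R, element (n, j) approximates d g_n / d rho_j."""
    rho = np.asarray(rho, dtype=float)
    cols = []
    for j in range(rho.size):
        e = np.zeros_like(rho)
        e[j] = h
        cols.append((g(rho + e) - g(rho - e)) / (2 * h))
    return np.column_stack(cols)

# Hypothetical restriction with L = 3, R = 2: theta_3 = theta_1 * theta_2.
def g(rho):
    return np.array([rho[0], rho[1], rho[0] * rho[1]])

G = jacobian_fd(g, [1.0, 2.0])
print(np.linalg.matrix_rank(G) == G.shape[1])   # True: G has full column rank R
```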


Assumption 7: (Normality). The disturbances $\varepsilon_{kt}$ and $\nu_t$ are normally distributed for all k and t.
Footnotes

1.  For  detailed  discussion  of  nonlinear  three  stage  least  squares  esti- 
mators, see  Amemiya  (1977),  Gallant  (1977),  and  Gallant  and  Jorgenson  (1979). 

2. The correspondence between individual and aggregate behavior is discussed by Lau (1977, 1982) and Stoker (1982b).

3. An alternative approach to aggregation is based on restrictions on the distribution of the variables $x_{kt}$. See, for example, Stoker (1982a).

4. See, for example, Balestra and Nerlove (1966), Kmenta (1978), and
Mundlak  (1978).   Much  of  the  discussion  of  the  linear  model  focuses  on  the 
stochastic  specification  rather  than  the  structural  model;  see,  for  example, 
Amemiya  (1978).   The  literature  on  pooling  cross  section  and  average  time 
series  data  in  a  linear  model  has  been  surveyed  by  Dielman  (1983). 

5.  This  stochastic  specification  is  used  in  an  exact  aggregation  model 
by  Jorgenson,  Lau,  and  Stoker  (1980,  1981,  1982). 

6. The exclusion of $\nu_{t_o}$ from the cross section disturbances in Examples 2 and 3 may appear to be somewhat arbitrary. Suppose instead that $\nu_{t_o} + \varepsilon_{kt_o}$ represent the cross section disturbances. The $\nu_{t_o}$ can be estimated as the difference between the estimate of the cross section constant term and the constant term applicable to the time series. Correlation between the resulting cross section and time series disturbances is then due only to the $\varepsilon_{kt_o}$ terms, so that the effect of the transformation separating the two data sets is negligible.

7. This excludes the possibility that $x_{kt}$ is subject to measurement error; aggregate instruments would be required to deal with errors of measurement.

8. We assume that the variance of the disturbance, conditional on the instrumental variables, is constant for both cross section and average time series models. If this assumption is relaxed, efficiency gains are possible by adjusting the weighting matrix of equations (11) and (12). See White (1980a, 1980b, 1982) and Hansen (1982) for details.

9.  The  Gauss-Newton  method  for  systems  of  nonlinear  regression  equations 
is discussed by Malinvaud (1980).

10.  If  the  observations  are  transformed,  the  transformed  data  should  be 
used  here. 

11. Matrix weighted averages are discussed in Chamberlain and Leamer
(1976)  and  Mundlak  (1978),  among  others. 

12. This assumes that disturbances in different cross sections are uncorrelated, which requires transformations of the average data only. Overlapping cross sections require panel data techniques that are beyond the scope of this article.